…imputer singledispatched
eroell left a comment
Excellent as always!
Besides the inline comments, one question of interest:
Could you take physionet2012's 3D data, mask ~10% of the values, and impute them with (a) MissForest and (b) simple mean imputation? Then compare which one is more accurate by checking the mean of the imputed values against the true mean of the masked values, and the standard deviation of the imputed values against their true standard deviation.
A code snippet doing so just for the PR comment would be great!
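A minimal sketch of the comparison described above, under loud assumptions: synthetic Gaussian data stands in for physionet2012, only the mean-imputation arm is shown, and the helper names `mean_impute` and `summarize` are hypothetical. The MissForest arm would pass its imputed array to `summarize` in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a 3D array shaped (n_obs, n_vars, n_t); NOT physionet2012.
n_obs, n_vars, n_t = 50, 4, 10
true = rng.normal(loc=5.0, scale=2.0, size=(n_obs, n_vars, n_t))

# Mask roughly 10% of the entries completely at random.
mask = rng.random(true.shape) < 0.10
observed = true.copy()
observed[mask] = np.nan


def mean_impute(arr):
    """Fill NaNs with the per-variable mean taken over observations and time."""
    var_means = np.nanmean(arr, axis=(0, 2), keepdims=True)
    return np.where(np.isnan(arr), var_means, arr)


def summarize(true, imputed, mask):
    """Compare mean and std of imputed values with the true values they replace."""
    return {
        "mean_true": float(true[mask].mean()),
        "mean_imputed": float(imputed[mask].mean()),
        "std_true": float(true[mask].std()),
        "std_imputed": float(imputed[mask].std()),
    }


imputed = mean_impute(observed)
stats = summarize(true, imputed, mask)
print(stats)
```

Mean imputation fills every masked entry of a variable with the same value, so its imputed standard deviation should come out far below the true one; that gap is what the comparison is meant to expose.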
        max_iter=max_iter,
        random_state=random_state,
    )
    # not sure if this should be kept?
This can indeed be removed; you're defining the imputer in the single-dispatch function.
Also, line 568 has an unused definition of RandomForestClassifier, which you can throw out as well.
        "var_names parameter."
    )
    mtx = edata.X if layer is None else edata.layers[layer]
    input_dtype = mtx.dtype if np.issubdtype(mtx.dtype, np.floating) else np.float64
Could you add a quick comment on why this is needed here? :)
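For context, a small sketch of what that expression evaluates to for a few dtypes. The helper name `resolve_dtype` is hypothetical, and whether NaN handling is the author's actual motivation is an assumption; the sketch only shows that floating dtypes are kept as-is, everything else becomes `np.float64`, and integer arrays cannot store NaN.

```python
import numpy as np


def resolve_dtype(dtype):
    """Mirror the expression under review: keep floating dtypes, else use float64."""
    return dtype if np.issubdtype(dtype, np.floating) else np.float64


kept = resolve_dtype(np.dtype("float32"))      # floating dtype is preserved
from_int = resolve_dtype(np.dtype("int64"))    # integers are upcast
from_bool = resolve_dtype(np.dtype("bool"))    # so are booleans

# An integer array cannot hold a missing-value marker such as NaN.
ints = np.array([1, 2, 3], dtype=np.int64)
try:
    ints[0] = np.nan
    nan_in_int_failed = False
except ValueError:
    nan_in_int_failed = True

print(kept, from_int, from_bool, nan_in_int_failed)
```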
        miss_forest_impute(edata_blob_small, layer="layer_2")
    with pytest.raises(ValueError, match=r"only supports 2D data"):
        miss_forest_impute(edata_blob_small, layer=DEFAULT_TEM_LAYER_NAME)
@pytest.mark.parametrize("edata_mini_3D_missing_values", [True], indirect=True)
Could you add a parametrization to the "basic" test that also covers the array types? It should check that dask raises a ValueError and that sparse input works (at least in the 2D case).
@_miss_forest_impute_function.register(np.ndarray)
@_miss_forest_impute_function.register(sp.csr_array)
I think it is legitimate to accept sparse arrays for imputation and make them dense.
Could you mention this in the function docstring?
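A minimal sketch of the dispatch pattern under discussion, not the PR's implementation: `impute_sketch` is a hypothetical name, and per-column mean imputation stands in for MissForest so the example stays self-contained. Dense arrays are handled directly, sparse arrays are densified first, and any other type falls through to a `ValueError`.

```python
from functools import singledispatch

import numpy as np
import scipy.sparse as sp


@singledispatch
def impute_sketch(mtx):
    """Fallback for unregistered array types."""
    raise ValueError(f"Unsupported array type: {type(mtx)}")


@impute_sketch.register(np.ndarray)
def _(mtx):
    # Stand-in imputer: replace NaNs with the column mean (NOT MissForest).
    col_means = np.nanmean(mtx, axis=0)
    return np.where(np.isnan(mtx), col_means, mtx)


@impute_sketch.register(sp.csr_array)
def _(mtx):
    # Sparse input is densified, then handled by the dense implementation.
    return impute_sketch(mtx.toarray())


dense = np.array([[1.0, np.nan], [3.0, 4.0]])
dense_result = impute_sketch(dense)
sparse_result = impute_sketch(sp.csr_array(dense))

try:
    impute_sketch([[1.0, 2.0]])  # a plain list is not registered
    unsupported_raised = False
except ValueError:
    unsupported_raised = True
```

Because the sparse branch returns a dense array, the output type differs from the input type, which is the behaviour worth stating in the docstring.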
fixes #948
Extends `miss_forest_impute` to handle 3D `EHRData` inputs:
- Removes the `@function_2D_only()` decorator
- Reshapes 3D input to `(n_obs * n_t, n_vars)` (flattening along axis 0) before imputation and reshapes back to `(n_obs, n_vars, n_t)` afterwards
- Raises a `ValueError` when input is 3D but no layer is specified
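The reshape round trip in the description can be sketched with plain NumPy as follows. The helper names `flatten_3d` and `unflatten_3d` are hypothetical, and the exact axis handling in the PR may differ; this is just one way to get from `(n_obs, n_vars, n_t)` to `(n_obs * n_t, n_vars)` and back without scrambling values.

```python
import numpy as np


def flatten_3d(arr):
    """(n_obs, n_vars, n_t) -> (n_obs * n_t, n_vars); each row is one (obs, time) pair."""
    n_obs, n_vars, n_t = arr.shape
    # Move the variable axis last so that reshape keeps each variable in its own column.
    return arr.transpose(0, 2, 1).reshape(n_obs * n_t, n_vars)


def unflatten_3d(flat, n_obs, n_t):
    """Inverse of flatten_3d: (n_obs * n_t, n_vars) -> (n_obs, n_vars, n_t)."""
    n_vars = flat.shape[1]
    return flat.reshape(n_obs, n_t, n_vars).transpose(0, 2, 1)


arr = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # n_obs=2, n_vars=3, n_t=4
flat = flatten_3d(arr)
restored = unflatten_3d(flat, n_obs=2, n_t=4)

print(flat.shape)
print(np.array_equal(restored, arr))
```

Reshaping directly, without the transpose, would also yield an `(n_obs * n_t, n_vars)` array but would mix values from different variables into the same column.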